Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow for more flexible control of connection backlog and number of connections accepted "at once" #3062

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

ghost
Copy link

@ghost ghost commented Sep 8, 2021

This PR proposes a form for increased flexibility in a) controlling the backlog size for the listening connection(s) a Tornado application ends up using, and b) controlling how many connections are accepted during one iteration of Tornado's I/O loop, something that currently is inherently tied to the default backlog size (hence both changes with a single commit).

Tornado applications can admittedly already effectively control the backlog size applied to their listening socket(s). They may do so through a) importing and altering the value of the _DEFAULT_BACKLOG module variable from netutil (can't do so because default function arguments are assigned during function creation, not invocation), or b) resorting to initializing their listening sockets differently than what the example in the user guide illustrates, e.g. invoking bind oneself.

The second option has two drawbacks. First, it necessitates changing the "simple" application initialization code to a more involved one just to be able to specify own backlog. Second, by changing the code thus, the application will have to resolve and bind each socket itself -- something Tornado otherwise would have done for it "for free" with its very useful bind_sockets. Of course bind_sockets is also part of the API, and may be called as well, but that turns the code into an even more involved variant for initialization.

The first option does not have the aforementioned drawbacks, of course -- the application can be initialized as per the "simple" example in the user guide, having set the imported _DEFAULT_BACKLOG to a value application prefers by default.

More importantly, however, both options make it impossible to have Tornado not specify any backlog argument to listen at all. These proposed changes partially aim at alleviating that limitation. Thing is, a future version of Python is free to change how the "default reasonable value" is chosen for listen, and Tornado, being a framework of increasing adoption, in an increasingly varied number of scenarios, may do well by allowing applications to defer to said default behaviour by Python, instead of insisting on a "magic" number of its own.

So this PR proposes to first and foremost change the default backlog size to None, allowing Tornado to be able to correspondingly omit providing backlog argument to listen and thus defer to a backlog as decided by Python. Of course, specifying an integer value means the value will be passed to listen, as before.

Additionally, due to the default backlog size now potentially being None, the number of connections attempted accepted at one time can no longer simply follow the default backlog size as before. We use this opportunity to add an additional module variable which may be used to control the number [of connections to try and accept], deferring (if None) to default backlog size if the latter is non-zero, or ultimately, 128 (a final fallback for lack of a better magic value).

There is a prior conversation touching on this topic.

This commit is about three following changes:

1. Control of default backlog throughout the lifetime of a Tornado
application, through the now exported (no `_` prefix) module variable.
Previously, by way of how Python initializes function argument defaults,
the default was used once during module initialization (when procedures are
created) which "froze" the default value in the sense there was no way to
affect the default during application lifetime -- typically as
configuration is loaded, for example. To allow application to configure
the default backlog when it has the preferred value available, evaluation
of the default backlog immediately before calling `listen`, is thus
necessary.

2. Deferring to Python deciding on the backlog, through using `None` as
valid value. For Python 3.6, for example, that would imply the minimum
between 128 and SOMAXCONN:
https://github.com/python/cpython/blob/3.6/Modules/socketmodule.c#L3026

3. Because the number of connections Tornado attempted to `accept` with
each iteration of it's I/O loop, was tied to the default backlog value,
because this commit allows for `None` or zero (valid for `listen`) values
for the latter, the heuristics needed to be amended because neither zero
nor `None` would be an acceptable number of connections to accept. This
problem is solved by appropriately adding a separate variable that
controls the number, with fallback through a non-zero default backlog to
128.
DEFAULT_BACKLOG = None

# Number of connections to try and accept in one sweep of an I/O loop;
# if falsy (e.g. `None` or zero) the number will be decided on
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These comments imply that monkey-patching these constants is intended to be a public interface for configuration. I don't think that's a good idea (or at least I'd want to see some more justification for why someone might want to change this so we could think about the most appropriate interface).

I'd rather just hard-code the number of iterations in accept_handler, and get rid of the tornado-side _DEFAULT_BACKLOG (so we call listen() with an argument if a backlog is given, and otherwise use no arguments so that the default is determined by lower levels).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am currently debugging a case with our usage of tornado where we do want to change this to a much smaller value.

What we have seen is that in multiprocessing pre-fork setups, where we have a large number of workers sharing 1 socket, we see very poor tail latencies under high load. We have narrowed down due to huge amounts of queuing within processing that accidentally accept far too high of a share of the incoming connections. IE while we might have on average 3 async requests in flight per worker, when we see the outliers with high tail latencies hitting timeouts and inspect their io_loops we'll see that 30 requests connections got all accepted at once on that worker, and that 30th request queue up at the end in the worker will have massively increased latency.

We are currently looking at ways to prevent this poor load balancing, and at least one element of it, might be to limit the number of connection accepted at a time, so that if a small queue under high load of pending connections has built up, it's not entirely taken by a single worker, and is better distributed in smaller chunks over multiple workers.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. That does sound like a good reason to make the accept-calls-per-loop value configurable (or maybe just to hard-code it to a small value, but I suspect we'd have a hard time agreeing on what that value should be), so the accept-calls-per-loop variable is valuable even aside from its use here for the case when the accept backlog is unspecified/None.

I still don't like monkey-patching module-level constants as the configuration interface, but it seems like whatever interface we come up with should be used for both variables.

In the meantime, have you tried SO_REUSEPORT (and binding the socket after the fork)? My understanding is that the kernel treats this differently from the case of a socket opened before a fork, in a way that generally does a better job at balancing the load across the processes. I don't have any experience with it at scale, though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants